Data analysis apparatus, data analysis method, and data analysis program

ABSTRACT

An object of the invention is to harmonize prediction accuracy and an analysis time of an ensemble model. Therefore, when performing data analysis using an ensemble model 300 that makes an inference by integrating inferences by first to n-th models, an i-th model (1≤i≤n) constituting the ensemble model 300 is selected from an i-th model group of the model data, at least one model group of the first to n-th model groups includes a plurality of models, and the first to n-th models capable of constituting an ensemble model satisfying a performance requirement for data analysis and a constraint requirement for time required for the data analysis are selected from the first to n-th model groups 301 to 303.

TECHNICAL FIELD

The present invention relates to a data analysis apparatus, a data analysis method, and a data analysis program.

BACKGROUND ART

In order for a person to freely move his/her body, locomotive organs made up of bones, joints, muscles and nerves need to function normally. Locomotive syndrome (“locomo”) refers to a condition in which one or more locomotive organs are impaired and movement functions such as standing, walking, running, and sitting decline. When such a decline in the movement functions progresses, difficulties arise even in daily life. It is said that locomotor disorders that require hospital treatment usually occur after the age of 50, and locomotor disorders in the elderly lead to a risk of needing support or care. Since locomotor disorders progress gradually, the need for prevention, early detection, and appropriate management of locomo is recognized. Patent Literature 1 discloses a walking mode analysis apparatus that measures a walking state of a measurement subject, calculates feature amount data from a measurement result, and analyzes a walking mode of the measurement subject using the calculated feature amount data and an analysis model.

In Patent Literature 2, in constructing a prediction model, candidates for preprocessing of input data, a data learning method based on a hyperparameter, and the like are set in advance, and a pipeline capable of constructing a prediction model with higher prediction accuracy is selected from combinations (referred to as pipelines) of these candidates. A search is performed using sample data extracted at a predetermined ratio from learning data so that the time required for the pipeline search does not increase even when the number of candidates increases, the extraction ratio of the sample data is increased as long as the processing time does not exceed a time limit, and a combination in which the prediction accuracy of the prediction model is high is searched for.

CITATION LIST

Patent Literature

-   PTL 1: Japanese Patent No. 6509406
-   PTL 2: JP-A-2018-190130

SUMMARY OF INVENTION

Technical Problem

A decline in movement functions of a person manifests as a gait disorder. To promote early detection and remission of locomo, it is effective to know the walking state of the person and to inform the subject of it in an easy-to-understand manner. From the viewpoint of prevention or early detection of a locomotor disorder, it is desirable that an analysis apparatus as disclosed in Patent Literature 1 is provided not only in a medical institution but also in a fitness gym or the like, so that even a measurement subject who is unaware of a locomotor disorder can easily become aware of his/her walking state.

However, the more precisely and accurately a walking mode is analyzed, the larger the amount of feature amount data used for analysis becomes, and the time required for calculating the feature amount data and for the analysis using the feature amount data also increases. When it takes a long time to obtain an analysis result, a measurement subject who is unaware of a locomotor disorder may be unwilling to wait. In particular, when the analysis apparatus is provided in a place close to the measurement subject, it is desirable to calculate the feature amount data and analyze the walking state with a generally used personal computer (PC) or the like, and it cannot be assumed that a computer with particularly high computing capability is used.

Patent Literature 2 discloses shortening a search time for pipeline selection for construction of a prediction model, and does not refer to the time required for analysis using the prediction model.

Solution to Problem

A data analysis apparatus according to an embodiment of the invention performs data analysis using an ensemble model that makes an inference by integrating inferences by first to n-th models. The data analysis apparatus includes: a processor; a memory; a storage; and a data analysis program read into the memory and executed by the processor. The storage stores model data in which first to n-th model groups each including one or more models are registered, an i-th model (1≤i≤n) constituting the ensemble model is selected from an i-th model group of the model data, at least one model group of the first to n-th model groups includes a plurality of models, and the data analysis program includes: an ensemble model creation processing unit configured to present, from the respective first to n-th model groups, options of the first to n-th models capable of constituting the ensemble model satisfying a performance requirement for data analysis and a constraint requirement for time required for the data analysis; and an ensemble analysis processing unit configured to receive selection of the presented options of the first to n-th models and make an inference by the ensemble model using the selected first to n-th models.

Advantageous Effect

In an analysis using an ensemble model, prediction accuracy and an analysis time of the ensemble model are harmonized.

Other technical problems and novel characteristics will be apparent from the description and the accompanying drawings.

BRIEF DESCRIPTION OF DRAWINGS

FIG. 1 shows a hardware configuration of a data analysis system.

FIG. 2 shows a software configuration of the data analysis system.

FIG. 3 is a schematic diagram of an ensemble model.

FIG. 4 shows an example of analysis setting data.

FIG. 5 shows an example of domain knowledge data.

FIG. 6 shows a processing flow for analyzing a walking mode of a measurement subject.

FIG. 7 shows a data structure of measurement data.

FIG. 8 shows a data structure of feature amount data.

FIG. 9 shows a data structure of prediction result data.

FIG. 10 shows an evaluation flow of a model (weak recognizer).

FIG. 11 shows a data structure of model data.

FIG. 12 shows a flow for selecting a model (weak recognizer) and selecting a feature amount to be calculated.

FIG. 13 shows an ensemble model determination flow.

FIG. 14 shows a data structure of ensemble model data.

FIG. 15 shows a data structure of selected feature amount data.

DESCRIPTION OF EMBODIMENTS

FIG. 1 shows a hardware configuration of a data analysis system 110 that analyzes a walking mode of a pedestrian. The data analysis system 110 includes a sensor 111 that measures walking of a measurement subject, and a data analysis apparatus 100 that measures the walking of the measurement subject using the sensor 111 and analyzes a walking mode from a measurement result.

The data analysis apparatus 100 includes a central processing unit (CPU) 101, an input interface (I/F) 102, an output I/F 103, a memory 104, a storage 105, and an I/O port 106, which are connected by an internal bus 107. The data analysis apparatus 100 is an information processing apparatus that can be implemented by a general-purpose computer. The input I/F 102 is connected to an input device such as a keyboard or a mouse, and the output I/F 103 is connected to a display or a printer to implement a graphical user interface (GUI) for an operator. The storage 105 usually includes a nonvolatile memory such as an HDD, an SSD, a ROM, or a flash memory, and stores a program to be executed by the data analysis apparatus 100, data to be processed by the program, and the like. The memory 104 includes a random access memory (RAM), and temporarily stores the program, data necessary for executing the program, and the like according to a command of the CPU 101. The CPU 101 executes the program loaded from the storage 105 to the memory 104.

The data analysis apparatus 100 issues a collection command of sensing data to the sensor 111. The sensor 111 senses the walking of the measurement subject in response to the command and transmits a measurement result to the data analysis apparatus 100. A distance sensor based on a time of flight (TOF) method can be used as the sensor 111. In order to capture the walking mode of the measurement subject, it is necessary to measure a movement (trajectory) in a three-dimensional space of a measurement point (joint or the like) of a body of the measurement subject during walking, and the distance sensor has an advantage that coordinates of the measurement point in the three-dimensional space can be directly obtained. The sensor 111 is not limited to the distance sensor and may be a video camera, in which case image analysis is performed on a video obtained by imaging the measurement subject during walking. A sensor such as an acceleration sensor, an angle sensor, or a gyro sensor may also be used. It is also possible to use a plurality of types of sensors.

FIG. 2 shows a software configuration of the data analysis system 110, and shows programs executed in the data analysis apparatus 100 and a relation between the programs. A data analysis program 200 has a function of measuring walking and analyzing a walking mode from a measurement result. A user input-output processing unit 201 is an interface program by which an operator inputs instructions and information to modules 202 to 207. The modules 202 to 207 are programs that execute functions related to measurement of walking or analysis of a walking mode, and contents thereof will be described later. A database program 210 has a function of storing and managing, in the storage 105, measurement data and analysis models necessary for the data analysis system 110.

In the present embodiment, the walking mode is analyzed using an ensemble model. The ensemble model is a model that integrates inferences by a plurality of models (weak recognizers) into one inference. FIG. 3 is a schematic diagram of an ensemble model applied to the present embodiment. An ensemble model 300 integrates determination results of three models (weak recognizers) and determines whether the walking of the measurement subject is healthy. The three models include a healthy walking model that determines whether the walking of the measurement subject is healthy walking, a first abnormal walking model that determines whether the walking of the measurement subject is abnormal walking 1, and a second abnormal walking model that determines whether the walking of the measurement subject is abnormal walking 2. Each of the abnormal walking 1 and 2 is a specific walking state that is regarded as a gait disorder. The ensemble model 300 compares an abnormality degree 1 (probability of the abnormal walking 1) output from the first abnormal walking model and an abnormality degree 2 (probability of the abnormal walking 2) output from the second abnormal walking model, sets the larger one thereof as a maximum abnormality degree (probability of abnormal walking), and integrates the maximum abnormality degree and a health degree (probability of healthy walking) output from the healthy walking model to output a degree of healthy person walking (probability of healthy walking). The plurality of models (weak recognizers) and the integration method thereof shown in FIG. 3 are one example.
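The integration in FIG. 3 can be sketched as follows. This is a minimal illustration, not the integration method of the embodiment: taking the larger abnormality degree follows the description above, whereas the averaging rule, the function name `integrate`, and the 0.5 decision threshold are assumptions introduced only for this example.

```python
def integrate(health_degree, abnormality_1, abnormality_2):
    """Combine three weak-recognizer outputs into a degree of healthy person walking."""
    # From the description: the larger abnormality degree is the maximum abnormality degree.
    max_abnormality = max(abnormality_1, abnormality_2)
    # Assumption: the text does not specify the integration rule, so this sketch
    # averages the health degree with the complement of the maximum abnormality degree.
    return (health_degree + (1.0 - max_abnormality)) / 2.0


# Hypothetical outputs of the three selected models.
score = integrate(health_degree=0.8, abnormality_1=0.1, abnormality_2=0.3)
is_healthy = score >= 0.5  # hypothetical decision threshold
```

Any other integration rule (for example, a weighted sum or a learned combiner) could replace the averaging step without changing the surrounding flow.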

It is assumed that the models that constitute the ensemble model 300 and output the health degree, the abnormality degree 1, and the abnormality degree 2 are selected from respective model groups, and at least one of the model groups includes a plurality of models. In the example of FIG. 3, the model that outputs the health degree can be selected from models 1 and 2 registered as a healthy walking model group 301, the model that outputs the abnormality degree 1 can be selected from models 3 and 4 registered as a first abnormal walking model group 302, and the model that outputs the abnormality degree 2 can be selected from models 5 and 6 registered as a second abnormal walking model group 303. The data analysis system 110 of the present embodiment selects one model from the models registered in each of the model groups 301 to 303 in accordance with the performance and constraints required of the data analysis, thereby implementing analysis according to the needs of the measurement subject.

Accordingly, in order to adapt the data analysis system 110 to measurement subject groups that differ in the performance required of the data analysis and in the constraints allowed for the data analysis, an administrator of the data analysis system 110 activates the analysis setting processing unit 206 and registers analysis setting data 213 and domain knowledge data 217 (see FIG. 2).

FIG. 4 shows an example of the analysis setting data 213. The analysis setting data 213 defines the performance and constraints of the ensemble model applied to each measurement subject group. An analysis target 2132 indicates the measurement subject group, and in this example the measurement subject group is defined by the place where the data analysis system 110 is used. The definition method is not limited to this example and can be freely selected. A performance requirement and a constraint requirement of the ensemble model are defined for each analysis target. The performance requirement is defined by a performance index 2133 and a performance threshold 2134. For example, in the case of the measurement subject group (care facility) defined as setting ID 1, the performance requirement is that the performance index, the Matthews correlation coefficient (MCC), is 0.2 or more. When defining the performance requirement, a result better suited to the needs of each measurement subject group can be expected by using a performance index selected from a plurality of performance indexes. For example, an index reflecting the required performance and a threshold for the index are set depending on whether the measurement subject group emphasizes accuracy or reproducibility. On the other hand, the constraint requirement is defined by a time constraint 2135 indicating an upper limit of the time allowed for data analysis. In this example, a measurement subject (setting ID 2) of a fitness gym is subject to a strict constraint on the analysis time, and a measurement subject (setting ID 3) of a medical facility has no constraint on the analysis time (setting the time constraint 2135 to a negative value indicates that no constraint is set).
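As a rough sketch of how one record of the analysis setting data 213 might be represented and checked, the following uses the fields named above. Only the care facility index (MCC) and threshold (0.2) and the negative-value convention come from the description; the class name, field names, the unit of seconds, and all other values are placeholders assumed for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AnalysisSetting:
    setting_id: int
    analysis_target: str          # measurement subject group (2132)
    performance_index: str        # performance index (2133)
    performance_threshold: float  # performance threshold (2134)
    time_constraint: float        # time constraint (2135); negative means no constraint

    def is_satisfied(self, index_value, analysis_time):
        """Return True when both the performance and the time requirement hold."""
        meets_performance = index_value >= self.performance_threshold
        meets_time = self.time_constraint < 0 or analysis_time <= self.time_constraint
        return meets_performance and meets_time


SETTINGS = [
    # MCC >= 0.2 is from the description; the 60 s limit is a placeholder.
    AnalysisSetting(1, "care facility", "MCC", 0.2, 60.0),
    # Index, threshold, and the 10 s limit are placeholders for a strict constraint.
    AnalysisSetting(2, "fitness gym", "MCC", 0.2, 10.0),
    # Negative time constraint: no constraint on the analysis time.
    AnalysisSetting(3, "medical facility", "MCC", 0.2, -1.0),
]
```

With these placeholder values, an ensemble evaluated at MCC 0.3 and 30 seconds would satisfy settings 1 and 3 but not setting 2.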

FIG. 5 shows an example of the domain knowledge data 217. The domain knowledge data 217 defines important feature amounts for each measurement subject group. A feature amount is calculated based on a movement (trajectory), which is measured by the sensor 111, in a three-dimensional space of a measurement point (joint or the like) of a body of the measurement subject during walking, and is a movement, a correlation, or the like of a joint or an axis of the measurement subject during walking. The domain knowledge data 217 specifies feature amounts to be included in the analysis regardless of their weighting in a prediction model. For example, a feature amount that a doctor or trainer wants to refer to when explaining an analysis result to the measurement subject is applicable. The domain knowledge data 217 is also defined for each measurement subject group, using the same group definition as the analysis setting data 213. In this example, an importance degree 2174 is defined for a feature amount registered in a feature amount name 2173. For example, in the analysis target “care facility”, a feature amount A has a higher importance degree than a feature amount B (knowledge IDs 1 and 2). When the feature amounts to be calculated are narrowed down, a feature amount having a small importance degree can be excluded from the calculation targets. Although the importance degree of each feature amount is defined in this example, a ranking of the importance degree of each feature amount for each measurement subject group may be defined instead.
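The narrowing of feature amounts by importance degree could look like the sketch below. The row layout mirrors the fields named above, but the numeric importance degrees, the third row, and the helper name `narrow_features` are hypothetical; the description only states that feature amount A outranks feature amount B for the care facility.

```python
DOMAIN_KNOWLEDGE = [
    # (knowledge ID, analysis target, feature amount name 2173, importance degree 2174)
    (1, "care facility", "feature amount A", 0.9),  # numeric values are hypothetical
    (2, "care facility", "feature amount B", 0.4),
    (3, "fitness gym", "feature amount B", 0.7),    # hypothetical row
]


def narrow_features(rows, analysis_target, keep):
    """Return the `keep` most important feature amount names for one analysis target."""
    ranked = sorted(
        (row for row in rows if row[1] == analysis_target),
        key=lambda row: row[3],
        reverse=True,
    )
    return [row[2] for row in ranked[:keep]]


kept = narrow_features(DOMAIN_KNOWLEDGE, "care facility", keep=1)
```

If a ranking were stored instead of an importance degree, only the sort key would need to change.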

A processing flow in which a measurer 610 analyzes a walking mode of a measurement subject 620 by a PC 600 that is the data analysis apparatus 100 will be described with reference to FIG. 6. The PC 600 is disposed in a specific place (a care facility, a fitness gym, a medical facility, or the like) defined as an analysis target. In the PC 600, an ensemble model is constructed to satisfy the performance requirement and the constraint requirement set for its installation location in the analysis setting data 213 described above, and the feature amount data necessary for the constructed ensemble model is selected.

When the measurement subject 620 makes a measurement request to the measurer 610 (S600), the measurer 610 performs a measurement start operation on the user input-output processing unit 201 (S601). First, the user input-output processing unit 201 issues a measurement start request to the data measurement processing unit 202 (S602). The data measurement processing unit 202 measures the walking of the measurement subject 620 using the sensor 111 (S603), and stores obtained measurement data in the storage 105 (S604). FIG. 7 shows a data structure of measurement data 211.

The measurement data 211 is a trajectory of a measurement point of the measurement subject in the three-dimensional space, and (X, Y, Z) coordinates 2114 of each measurement point for each time indicated by a time stamp 2113 are stored. As the measurement point, a joint or the like that affects the walking mode is set. A data ID 2111 is an ID assigned to each record included in the measurement data 211, and a measurement ID 2112 is an ID assigned to each measurement request of the measurement subject 620.

When the measurement of the walking of the measurement subject 620 ends, the user input-output processing unit 201 issues a feature amount calculation request to the feature amount calculation processing unit 203 (S605). The feature amount calculation processing unit 203 receives inputs of selected feature amount data 216 for specifying a feature amount to be used for the ensemble model and the measurement data 211 of the measurement subject 620 (S606, S607), calculates feature amount data 212 specified by the selected feature amount data 216, and stores the obtained feature amount data in the storage 105 (S608). FIG. 8 shows a data structure of the feature amount data 212. In the feature amount data 212, a feature amount 2122 specified by the selected feature amount data 216 is stored for each measurement ID 2112.
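The calculation of only the selected feature amounts can be illustrated with the following sketch. It assumes a single measurement point per record and two made-up feature functions (`vertical_range`, `path_length`); the actual feature amounts of the embodiment are not specified in the text, and the record layout is a simplification of FIG. 7.

```python
from collections import defaultdict


# Hypothetical feature functions over one trajectory: a list of (x, y, z) tuples in time order.
def vertical_range(trajectory):
    ys = [point[1] for point in trajectory]
    return max(ys) - min(ys)


def path_length(trajectory):
    return sum(
        sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5
        for p, q in zip(trajectory, trajectory[1:])
    )


FEATURES = {"vertical_range": vertical_range, "path_length": path_length}


def calculate_selected(records, selected):
    """records: iterable of (measurement ID, time stamp, (x, y, z)).
    selected: feature name -> adoption flag, in the spirit of the selected feature amount data 216."""
    trajectories = defaultdict(list)
    # Group coordinates by measurement ID in time-stamp order.
    for measurement_id, _, xyz in sorted(records, key=lambda r: (r[0], r[1])):
        trajectories[measurement_id].append(xyz)
    # Only adopted feature amounts are computed; the others cost no calculation time.
    return {
        measurement_id: {
            name: FEATURES[name](trajectory)
            for name, adopted in selected.items()
            if adopted
        }
        for measurement_id, trajectory in trajectories.items()
    }


records = [(1, 0, (0.0, 0.0, 0.0)), (1, 1, (3.0, 4.0, 0.0))]
feature_data = calculate_selected(records, {"path_length": True, "vertical_range": False})
```

Skipping non-adopted feature functions entirely is what allows the selection to shorten the analysis time.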

When the calculation of the feature amount selected by the selected feature amount data 216 ends, the user input-output processing unit 201 issues an analysis request to the ensemble analysis processing unit 205 (S610). The ensemble analysis processing unit 205 receives inputs of ensemble model data 218 and the feature amount data 212 (S611, S612), performs analysis using the ensemble model, stores prediction result data 214 (for example, in the example of FIG. 3, a degree of healthy person walking or a determination result of whether walking based on the degree of healthy person walking is healthy) in the storage 105 (S613), and presents a result to the measurement subject 620 by displaying the result on a display or the like (S614). FIG. 9 shows a data structure of the prediction result data 214. In the prediction result data 214, a prediction result 2143 for each measurement ID 2112 is stored.

FIG. 6 shows a processing flow in which the PC 600 analyzes the walking mode under predetermined performance requirements and constraint requirements. In addition to this flow, a time period in which there is no constraint, such as outside business hours, may be set in advance, a feature amount not designated in the selected feature amount data 216 may be calculated in that time period, and analysis may be performed by an ensemble model using different models (weak recognizers). Alternatively, the measurement data 211 may be transferred to another data analysis apparatus 100, and the walking mode may be analyzed without a constraint. Further, when the measurer 610 can diagnose the walking mode of the measurement subject 620, the diagnosis result given by the measurer 610 is tagged as teacher data to the measurement data 211 or to all feature amount data calculated from the measurement data 211. Accordingly, the measurement data of the measurement subject can be used as learning data for model relearning.

Before executing the processing flow of FIG. 6, the construction of the ensemble model and the selection of the feature amount to be calculated are performed to satisfy the definition of the analysis setting data 213. Hereinafter, the procedure will be described.

FIG. 10 shows an evaluation flow of models (weak recognizers) constituting the ensemble model. In the case of the ensemble model 300 shown in FIG. 3, the evaluation flow shown in FIG. 10 is executed for each of the models 1 to 6 included in the healthy walking model group 301, the first abnormal walking model group 302, and the second abnormal walking model group 303. This evaluation flow is performed each time learning is performed on each model. For example, each time the data analysis apparatus 100 learns a model, it is desirable to execute the evaluation flow of FIG. 10 and store an evaluation result together with the model.

An analyst 1000 performs a model evaluation start operation on the user input-output processing unit 201 (S1001). First, the user input-output processing unit 201 issues a feature amount calculation request to the feature amount calculation processing unit 203 (S1002). The feature amount calculation processing unit 203 receives inputs of the measurement data 211 stored in the storage 105 (S1003), and calculates total feature amount data 220 (S1004). Any measurement data may be used as the measurement data 211, and for example, measurement data used for learning a model may be used. The total feature amount data 220 includes all feature amounts used by a model (weak recognizer) that is an option of the ensemble model to be evaluated. When the total feature amount data 220 is calculated, the user input-output processing unit 201 issues a model evaluation request to the model evaluation unit 204 (S1005). The model evaluation unit 204 receives an input of the total feature amount data 220 (S1006), executes evaluation of each model, and stores model data 215 including an evaluation result in the storage 105 (S1007). FIG. 11 shows a data structure of the model data 215.

A model ID 2151 is an ID for specifying each of the models (weak recognizers) constituting the ensemble model. An algorithm used in each model is stored in an algorithm 2152, an object variable (for example, healthy walking, abnormal walking 1, and abnormal walking 2 in the example of FIG. 3) of the model is stored in an object variable 2153, and binary data of the model is stored in model data 2154. Results evaluated by the model evaluation unit 204 are stored in a processing speed 2155 and a performance index 2156. The processing speed 2155 indicates the time from when a feature amount is input to each model to when a recognition result is output. The performance index 2156 stores an evaluation result for each performance index (performance index appearing in the analysis setting data 213) used to define the performance requirement of the ensemble model.

FIG. 12 shows a flow for selecting a model (weak recognizer) to be used for the ensemble model and selecting a feature amount to be calculated. The flow of FIG. 12 is preferably performed by an information processing apparatus that performs actual analysis, that is, the PC 600 that executes the processing flow of FIG. 6 in the present embodiment. The time required for calculating the feature amount and recognizing by the model differs depending on calculation performance and a state of the information processing apparatus. Therefore, it is possible to improve reliability of pre-evaluation results of the performance and constraints of the ensemble model by constructing the ensemble model and selecting the feature amount to be calculated with the information processing apparatus that performs the actual analysis.

The analyst 1000 performs an ensemble model creation operation on the user input-output processing unit 201 (S1201). The user input-output processing unit 201 issues an ensemble model creation request to the ensemble model creation processing unit 207 (S1202). The ensemble model creation processing unit 207 receives inputs of the analysis setting data 213, the measurement data 211, the model data 215, and the domain knowledge data 217 stored in the storage 105 (S1203 to S1206), creates the ensemble model data 218 for specifying the models constituting the ensemble model satisfying the predetermined performance requirements and constraint requirements and the selected feature amount data 216 for specifying a feature amount required to be calculated for the ensemble model, and stores the ensemble model data 218 and the selected feature amount data 216 in the storage 105 (S1207 to S1208).

FIG. 13 shows an ensemble model determination flow executed by the ensemble model creation processing unit 207. First, selection of an analysis target (measurement subject group) using the PC 600 is received (S1301). By collating the input analysis target (measurement subject group) with the analysis setting data 213, it is possible to obtain the performance requirements and the constraint requirements required for the ensemble model.

Subsequently, a candidate model (weak recognizer) to be used for the ensemble model is selected (S1302). The candidate model to be used for the ensemble model is selected based on the processing speed 2155 and the performance index 2156 stored in the model data 215. In the selection, a candidate model is selected so that a performance index specified as a performance requirement is highest. In this case, a plurality of candidates may be selected.
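The initial candidate selection in S1302 might be sketched as below. The record fields follow the model data 215, but the class name, the function `pick_candidates`, and every numeric value are hypothetical, and this sketch ranks by the performance index alone, leaving the processing speed 2155 for the later time evaluation.

```python
from dataclasses import dataclass


@dataclass
class ModelRecord:
    model_id: int             # model ID (2151)
    object_variable: str      # object variable (2153)
    processing_speed: float   # processing speed (2155): time per recognition, unit assumed to be seconds
    performance_index: dict   # performance index (2156): index name -> evaluated value


def pick_candidates(models, index_name):
    """For each object variable (model group), keep the model with the highest index value."""
    best = {}
    for model in models:
        current = best.get(model.object_variable)
        if current is None or model.performance_index[index_name] > current.performance_index[index_name]:
            best[model.object_variable] = model
    return best


# Models 1 to 6 of FIG. 3 with made-up evaluation results.
MODELS = [
    ModelRecord(1, "healthy walking", 0.8, {"MCC": 0.35}),
    ModelRecord(2, "healthy walking", 0.2, {"MCC": 0.25}),
    ModelRecord(3, "abnormal walking 1", 0.9, {"MCC": 0.30}),
    ModelRecord(4, "abnormal walking 1", 0.3, {"MCC": 0.28}),
    ModelRecord(5, "abnormal walking 2", 0.1, {"MCC": 0.22}),
    ModelRecord(6, "abnormal walking 2", 0.7, {"MCC": 0.40}),
]

candidates = pick_candidates(MODELS, "MCC")
candidate_ids = {group: model.model_id for group, model in candidates.items()}
```

Returning several high-ranking models per group instead of one would correspond to the case in which a plurality of candidates is selected.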

Subsequently, for the ensemble model to which the selected candidate model (weak recognizer) is applied, the performance and the analysis time on the actual machine are evaluated (S1303). In the performance evaluation, the performance index specified as the performance requirement is calculated. The evaluated analysis time includes the time required to calculate the feature amount data from the measurement data and the time required to perform analysis by the ensemble model from the feature amount data. The calculation time of the feature amount data is the time required to calculate the feature amounts necessary for analysis by the ensemble model constituted by the candidate model. Since the processing speed 2155 stored in the model data 215 is not necessarily a processing speed evaluated on the PC 600, having the PC 600 perform the analysis on the actual measurement data 211 makes it possible to estimate more accurately the time required for the analysis by the ensemble model. The measurement data used for the analysis time evaluation may be the measurement data used for learning the model, measurement data measured by the PC 600 in the past, or any other measurement data.
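Measuring the analysis time on the actual machine, as the sum of the feature amount calculation time and the ensemble inference time, could be done along the following lines. The function name and the two callables are placeholders for this sketch; the stand-in feature calculation and inference at the bottom exist only to make the example runnable.

```python
import time


def evaluate_analysis_time(measurement, calc_features, infer):
    """Time feature calculation and ensemble inference separately on this machine."""
    start = time.perf_counter()
    features = calc_features(measurement)  # measurement data -> feature amount data
    middle = time.perf_counter()
    result = infer(features)               # feature amount data -> ensemble inference
    end = time.perf_counter()
    return {
        "feature_time": middle - start,
        "inference_time": end - middle,
        "analysis_time": end - start,      # compared against the time constraint 2135
        "result": result,
    }


# Trivial stand-ins for the real feature calculation and ensemble model.
timing = evaluate_analysis_time([1, 2, 3], calc_features=lambda m: [2 * v for v in m], infer=sum)
```

In practice the measurement would be repeated over several sets of measurement data, since a single run is sensitive to the momentary state of the machine.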

When the analysis time evaluation by the actual machine (S1303) satisfies the time constraint 2135 (see FIG. 4) of the analysis target (yes in S1304), performance information on the ensemble model using the selected model (weak recognizer) is displayed on a display or the like (S1307). When there are a plurality of candidates, each candidate is displayed together with the performance information. The analyst 1000 checks the performance information and determines a model (weak recognizer) to be used for the ensemble model from the presented model candidates (S1308).

When the analysis time evaluation by the actual machine (S1303) does not satisfy the time constraint 2135 of the analysis target (no in S1304), a model candidate is selected so that the performance index specified as the performance requirement is as high as possible, based on that performance index and on the deviation between the analysis time evaluated in S1303 and the time constraint that is the constraint requirement (S1305).

At this time, a model candidate is selected so that the feature amounts to be calculated are limited, based on the importance degree of each feature amount and on the deviation between the analysis time evaluated in S1303 and the time constraint that is the constraint requirement (S1306). As the importance degree of the feature amount, both an importance degree in the analysis algorithm and an importance degree in the explanation of an analysis result to the measurement subject are considered. The importance degree in the analysis algorithm can be determined from the binary data of the model data 2154, and the importance degree in the explanation of an analysis result to the measurement subject can be determined from the domain knowledge data 217. For at least one model constituting the ensemble model, omitting the calculation of a feature amount having a small influence on the analysis result or on its explanation (this state is referred to as an “input constrained state”) can be expected to reduce the time required for the calculation of the feature amounts while limiting the decline in performance as much as possible. Also in S1305 and S1306, a plurality of candidates may be selected.

The performance and the analysis time of the actual machine are evaluated again (S1303) based on the selected model candidate and a feature amount candidate, and the selection and the ensemble model evaluation by the actual machine are repeated while changing a combination of models (weak recognizers) constituting the ensemble model and the selection of the feature amount until the model candidate and the feature amount candidate satisfying the time constraint are obtained.
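The overall effect of the loop S1302 to S1306 is to find model combinations that meet both requirements. The sketch below substitutes a plain exhaustive search over all combinations for the deviation-guided reselection described above, and omits the narrowing of feature amounts; `find_options`, the `evaluate` callback, and the per-model values are assumptions made for illustration only.

```python
from itertools import product


def find_options(model_groups, evaluate, threshold, time_limit):
    """Return combinations (one model per group) that satisfy both requirements.

    evaluate(combo) -> (performance index value, analysis time) measured on the actual machine.
    A negative time_limit means that no time constraint is set.
    """
    options = []
    for combo in product(*model_groups):
        performance, analysis_time = evaluate(combo)
        within_time = time_limit < 0 or analysis_time <= time_limit
        if performance >= threshold and within_time:
            options.append((combo, performance, analysis_time))
    # Present the best-performing options first.
    return sorted(options, key=lambda option: option[1], reverse=True)


# Hypothetical per-model performance and time, standing in for a real evaluation.
PERF = {1: 0.5, 2: 0.3, 3: 0.4, 4: 0.25, 5: 0.45, 6: 0.3}
COST = {1: 4.0, 2: 1.0, 3: 4.0, 4: 1.0, 5: 4.0, 6: 1.0}


def fake_evaluate(combo):
    return sum(PERF[m] for m in combo) / len(combo), sum(COST[m] for m in combo)


GROUPS = [[1, 2], [3, 4], [5, 6]]
strict = find_options(GROUPS, fake_evaluate, threshold=0.2, time_limit=8.0)
unconstrained = find_options(GROUPS, fake_evaluate, threshold=0.2, time_limit=-1.0)
```

Exhaustive search is workable for three groups of two models but grows multiplicatively with the number of groups and models, which is one reason a guided search such as the one in FIG. 13 is preferable on a general-purpose PC.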

FIG. 14 shows a data structure of the ensemble model data 218 output by the ensemble model creation processing unit 207. For each model (weak recognizer) registered as the model data 215, adoption/non-adoption for the ensemble model is registered.

FIG. 15 shows a data structure of the selected feature amount data 216 output by the ensemble model creation processing unit 207. For each of the feature amounts that can be calculated by the data analysis apparatus 100, adoption/non-adoption for the ensemble model is registered.

While the invention made by the present inventor has been specifically described based on the embodiment, the invention is not limited thereto, and various modifications may be made without departing from the scope of the invention. In the embodiment, a walking mode analysis apparatus that analyzes the walking mode of the measurement subject has been described as an example, but the invention is widely applicable to an apparatus, a system, a method, and a program that perform data analysis using an ensemble model.

REFERENCE SIGN LIST

-   100 data analysis apparatus
-   101 CPU
-   102 input I/F
-   103 output I/F
-   104 memory
-   105 storage
-   106 I/O port
-   107 internal bus
-   110 data analysis system
-   111 sensor
-   200 data analysis program
-   201 user input-output processing unit
-   202 data measurement processing unit
-   203 feature amount calculation processing unit
-   204 model evaluation unit
-   205 ensemble analysis processing unit
-   206 analysis setting processing unit
-   207 ensemble model creation processing unit
-   210 database program
-   211 measurement data
-   212 feature amount data
-   213 analysis setting data
-   214 prediction result data
-   215 model data
-   216 selected feature amount data
-   217 domain knowledge data
-   218 ensemble model data
-   220 total feature amount data
-   300 ensemble model
-   301 healthy walking model group
-   302 first abnormal walking model group
-   303 second abnormal walking model group

1. A data analysis apparatus that performs data analysis using an ensemble model that makes an inference by integrating inferences by first to n-th models, the data analysis apparatus comprising: a processor; a memory; a storage; and a data analysis program read into the memory and executed by the processor, wherein the storage stores model data in which first to n-th model groups each including one or more models are registered, an i-th model (1≤i≤n) constituting the ensemble model is selected from an i-th model group of the model data, at least one model group of the first to n-th model groups includes a plurality of models, and the data analysis program includes: an ensemble model creation processing unit configured to present, from the respective first to n-th model groups, options of the first to n-th models capable of constituting an ensemble model satisfying a performance requirement for data analysis and a constraint requirement for time required for the data analysis; and an ensemble analysis processing unit configured to receive selection of the presented options of the first to n-th models and make an inference by the ensemble model using the selected first to n-th models.
 2. The data analysis apparatus according to claim 1, wherein the storage stores analysis setting data for setting the performance requirement and the constraint requirement for each of a plurality of analysis targets, and the ensemble model creation processing unit presents the options of the first to n-th models capable of constituting the ensemble model satisfying the performance requirement and the constraint requirement of an analysis target that is a target of the data analysis among the plurality of analysis targets.
 3. The data analysis apparatus according to claim 2, wherein the performance requirement is defined by a performance index and a threshold of the performance index, and an index corresponding to the plurality of analysis targets is set as the performance index.
 4. The data analysis apparatus according to claim 2, wherein the constraint requirement is provided as an upper limit of an analysis time including time required to calculate feature amount data from measurement data and time required to perform the data analysis using the ensemble model from the feature amount data.
 5. The data analysis apparatus according to claim 4, wherein the ensemble model creation processing unit makes an inference in an input constrained state in which a feature amount input to at least one model is selected among the presented options of the first to n-th models, and presents the options of the first to n-th models capable of constituting the ensemble model satisfying the performance requirement and the constraint requirement in the input constrained state.
 6. The data analysis apparatus according to claim 5, wherein the storage has domain knowledge data indicating importance of a feature amount in the analysis target that is a target of the data analysis, and the ensemble model creation processing unit selects a feature amount input to an ensemble model in the input constrained state based on the domain knowledge data and the importance of the feature amount in a model.
 7. The data analysis apparatus according to claim 5, wherein the ensemble model creation processing unit receives selection of the options of the presented first to n-th models, and stores, in the storage, ensemble model data for specifying the first to n-th models used in an ensemble model used by the ensemble analysis processing unit, and selected feature amount data for specifying a feature amount selected as a feature amount input to the ensemble model used by the ensemble analysis processing unit.
 8. The data analysis apparatus according to claim 7, wherein the data analysis program further includes a feature amount calculation processing unit configured to calculate feature amount data from measurement data, the feature amount calculation processing unit calculates the feature amount data from the measurement data for a feature amount specified in the selected feature amount data, and the ensemble analysis processing unit makes an inference by inputting the feature amount data calculated by the feature amount calculation processing unit to the ensemble model using the first to n-th models specified in the ensemble model data.
 9. The data analysis apparatus according to claim 8, wherein the feature amount calculation processing unit calculates the feature amount data from the measurement data for a feature amount not specified in the selected feature amount data in a predetermined time zone.
 10. The data analysis apparatus according to claim 9, wherein the feature amount data calculated by the feature amount calculation processing unit from the measurement data is used for learning of a model stored in the storage.
 11. A data analysis method for performing data analysis using an ensemble model that makes an inference by integrating inferences by first to n-th models, the data analysis method comprising: storing in advance model data in which first to n-th model groups each including one or more models are registered, an i-th model (1≤i≤n) constituting the ensemble model being selected from an i-th model group of the model data, at least one model group of the first to n-th model groups including a plurality of models; presenting, from the respective first to n-th model groups, options of the first to n-th models capable of constituting an ensemble model satisfying a performance requirement for the data analysis and a constraint requirement for time required for the data analysis; and receiving selection of the presented options of the first to n-th models and making an inference by the ensemble model using selected first to n-th models.
 12. The data analysis method according to claim 11, further comprising: making an inference in an input constrained state in which a feature amount input to at least one model is selected among the presented options of the first to n-th models; and presenting the options of the first to n-th models capable of constituting the ensemble model satisfying the performance requirement and the constraint requirement in the input constrained state.
 13. The data analysis method according to claim 12, further comprising: calculating first feature amount data from measurement data for a first feature amount input to an ensemble model that performs data analysis; and making an inference by inputting the calculated first feature amount data to the ensemble model that performs the data analysis.
 14. The data analysis method according to claim 13, further comprising: calculating second feature amount data from the measurement data for a second feature amount other than the first feature amount in a predetermined time zone.
 15. A data analysis program that performs data analysis using an ensemble model that makes an inference by integrating inferences by first to n-th models on an information processing apparatus that stores model data in which first to n-th model groups each including one or more models are registered, wherein an i-th model (1≤i≤n) constituting the ensemble model is selected from an i-th model group of the model data, at least one model group of the first to n-th model groups includes a plurality of models, and the data analysis program comprises: a first step of presenting, from the respective first to n-th model groups, options of the first to n-th models capable of constituting an ensemble model satisfying a performance requirement for the data analysis and a constraint requirement for time required for the data analysis; and a second step of receiving selection of the presented options of the first to n-th models and making an inference by the ensemble model using selected first to n-th models.